Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 494778 |
| Missing cells | 1147937 |
| Missing cells (%) | 12.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 71.7 MiB |
| Average record size in memory | 152.0 B |
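The derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch that only uses numbers copied from the table above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 19
n_observations = 494_778
missing_cells = 1_147_937
total_size_mib = 71.7

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both derived values round to the figures shown in the table (12.2% and 152.0 B).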
Variable types
| Numeric | 10 |
|---|---|
| Categorical | 8 |
| Unsupported | 1 |
| Alert | Type |
|---|---|
| Date has a high cardinality: 577 distinct values | High cardinality |
| Sales is highly correlated with Customers and 1 other fields | High correlation |
| DayOfWeek is highly correlated with Open | High correlation |
| Customers is highly correlated with Sales and 1 other fields | High correlation |
| Open is highly correlated with Sales and 2 other fields | High correlation |
| Sales is highly correlated with Customers and 1 other fields | High correlation |
| DayOfWeek is highly correlated with Open | High correlation |
| Customers is highly correlated with Sales and 1 other fields | High correlation |
| Open is highly correlated with Sales and 2 other fields | High correlation |
| Sales is highly correlated with Customers and 1 other fields | High correlation |
| Customers is highly correlated with Sales and 1 other fields | High correlation |
| Open is highly correlated with Sales and 1 other fields | High correlation |
| Promo2 is highly correlated with PromoInterval | High correlation |
| StoreType is highly correlated with Assortment | High correlation |
| PromoInterval is highly correlated with Promo2 | High correlation |
| Assortment is highly correlated with StoreType | High correlation |
| df_index is highly correlated with Store | High correlation |
| Sales is highly correlated with DayOfWeek and 3 other fields | High correlation |
| Store is highly correlated with df_index | High correlation |
| StoreType is highly correlated with Assortment | High correlation |
| Assortment is highly correlated with StoreType and 1 other fields | High correlation |
| CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek | High correlation |
| Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields | High correlation |
| Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other fields | High correlation |
| PromoInterval is highly correlated with Promo2SinceWeek and 1 other fields | High correlation |
| DayOfWeek is highly correlated with Sales and 1 other fields | High correlation |
| Customers is highly correlated with Sales and 1 other fields | High correlation |
| Open is highly correlated with Sales and 1 other fields | High correlation |
| Promo is highly correlated with Sales | High correlation |
| Sales has 14813 (3.0%) missing values | Missing |
| CompetitionOpenSinceMonth has 157257 (31.8%) missing values | Missing |
| CompetitionOpenSinceYear has 157257 (31.8%) missing values | Missing |
| Promo2SinceWeek has 242663 (49.0%) missing values | Missing |
| Promo2SinceYear has 242663 (49.0%) missing values | Missing |
| PromoInterval has 242663 (49.0%) missing values | Missing |
| DayOfWeek has 14838 (3.0%) missing values | Missing |
| Customers has 14900 (3.0%) missing values | Missing |
| Open has 14775 (3.0%) missing values | Missing |
| Promo has 14915 (3.0%) missing values | Missing |
| StateHoliday has 14837 (3.0%) missing values | Missing |
| SchoolHoliday has 15033 (3.0%) missing values | Missing |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| StateHoliday is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| Sales has 81986 (16.6%) zeros | Zeros |
| Customers has 82004 (16.6%) zeros | Zeros |
Reproduction
| Analysis started | 2021-11-07 10:16:35.622372 |
|---|---|
| Analysis finished | 2021-11-07 10:18:18.833957 |
| Duration | 1 minute and 43.21 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 494778 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 309093.2235 |
| Minimum | 1 |
| Maximum | 618471 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 30851.7 |
| Q1 | 154414.5 |
| median | 309105.5 |
| Q3 | 463615.75 |
| 95-th percentile | 587360.15 |
| Maximum | 618471 |
| Range | 618470 |
| Interquartile range (IQR) | 309201.25 |
Descriptive statistics
| Standard deviation | 178489.2561 |
|---|---|
| Coefficient of variation (CV) | 0.5774609163 |
| Kurtosis | -1.199783755 |
| Mean | 309093.2235 |
| Median Absolute Deviation (MAD) | 154599 |
| Skewness | 0.0004499433344 |
| Sum | 1.529325269 × 10^11 |
| Variance | 3.185841453 × 10^10 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2049 | 1 | < 0.1% |
| 320457 | 1 | < 0.1% |
| 492453 | 1 | < 0.1% |
| 494500 | 1 | < 0.1% |
| 504739 | 1 | < 0.1% |
| 506786 | 1 | < 0.1% |
| 500641 | 1 | < 0.1% |
| 502688 | 1 | < 0.1% |
| 414623 | 1 | < 0.1% |
| 416670 | 1 | < 0.1% |
| Other values (494768) | 494768 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 14 | 1 |
| Value | Count | Frequency (%) |
| 618471 | 1 | |
| 618470 | 1 | |
| 618469 | 1 | |
| 618468 | 1 | |
| 618467 | 1 | |
| 618466 | 1 | |
| 618465 | 1 | |
| 618463 | 1 | |
| 618462 | 1 | |
| 618461 | 1 |
Sales
Real number (ℝ≥0)
High correlation (×4), Missing, Zeros
| Distinct | 19017 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 14813 |
| Missing (%) | 3.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5666.719767 |
| Minimum | 0 |
| Maximum | 38037 |
| Zeros | 81986 |
| Zeros (%) | 16.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3638 |
| median | 5626 |
| Q3 | 7714 |
| 95-th percentile | 11947 |
| Maximum | 38037 |
| Range | 38037 |
| Interquartile range (IQR) | 4076 |
Descriptive statistics
| Standard deviation | 3805.148502 |
|---|---|
| Coefficient of variation (CV) | 0.6714905021 |
| Kurtosis | 1.914747986 |
| Mean | 5666.719767 |
| Median Absolute Deviation (MAD) | 2042 |
| Skewness | 0.6777930813 |
| Sum | 2719827153 |
| Variance | 14479155.12 |
| Monotonicity | Not monotonic |
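Several of the Sales statistics above are simple functions of one another. As a minimal sketch (values copied from the tables, names illustrative), the coefficient of variation, variance, IQR, range and sum can be recomputed as follows; the count of non-missing rows is assumed to be the number of observations minus the missing count.

```python
# Values copied from the Sales tables.
n_observations = 494_778
n_missing = 14_813
mean = 5666.719767
std = 3805.148502
q1, q3 = 3638, 7714
minimum, maximum = 0, 38037

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum
# The sum is the mean times the number of non-missing rows (assumed below).
approx_sum = mean * (n_observations - n_missing)

print(f"CV: {cv:.10f}")
print(f"variance: {variance:.2f}")
print(f"IQR: {iqr}, range: {value_range}")
print(f"sum: {approx_sum:.0f}")
```

Small differences in the last digits are expected because the inputs are themselves rounded.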
| Value | Count | Frequency (%) |
| 0 | 81986 | 16.6% |
| 5680 | 102 | < 0.1% |
| 5483 | 101 | < 0.1% |
| 6049 | 100 | < 0.1% |
| 5200 | 98 | < 0.1% |
| 5197 | 98 | < 0.1% |
| 5194 | 96 | < 0.1% |
| 5636 | 96 | < 0.1% |
| 5489 | 96 | < 0.1% |
| 4828 | 96 | < 0.1% |
| Other values (19007) | 397096 | |
| (Missing) | 14813 | 3.0% |
| Value | Count | Frequency (%) |
| 0 | 81986 | |
| 133 | 1 | < 0.1% |
| 286 | 1 | < 0.1% |
| 297 | 1 | < 0.1% |
| 416 | 1 | < 0.1% |
| 506 | 1 | < 0.1% |
| 538 | 1 | < 0.1% |
| 541 | 1 | < 0.1% |
| 552 | 1 | < 0.1% |
| 555 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 38037 | 1 | |
| 37403 | 1 | |
| 35909 | 1 | |
| 35350 | 1 | |
| 34904 | 1 | |
| 34814 | 1 | |
| 34692 | 2 | |
| 34475 | 1 | |
| 34369 | 1 | |
| 34001 | 1 |
Store
Real number (ℝ≥0)
| Distinct | 1115 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 557.8484512 |
| Minimum | 1 |
| Maximum | 1115 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 56 |
| Q1 | 279 |
| median | 558 |
| Q3 | 836 |
| 95-th percentile | 1059 |
| Maximum | 1115 |
| Range | 1114 |
| Interquartile range (IQR) | 557 |
Descriptive statistics
| Standard deviation | 321.7982629 |
|---|---|
| Coefficient of variation (CV) | 0.5768560659 |
| Kurtosis | -1.19993654 |
| Mean | 557.8484512 |
| Median Absolute Deviation (MAD) | 279 |
| Skewness | 0.0001858995185 |
| Sum | 276011141 |
| Variance | 103554.122 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 773 | 479 | 0.1% |
| 763 | 478 | 0.1% |
| 898 | 474 | 0.1% |
| 166 | 473 | 0.1% |
| 486 | 472 | 0.1% |
| 219 | 472 | 0.1% |
| 393 | 472 | 0.1% |
| 793 | 471 | 0.1% |
| 374 | 470 | 0.1% |
| 442 | 470 | 0.1% |
| Other values (1105) | 490047 |
| Value | Count | Frequency (%) |
| 1 | 464 | |
| 2 | 456 | |
| 3 | 444 | |
| 4 | 446 | |
| 5 | 463 | |
| 6 | 451 | |
| 7 | 459 | |
| 8 | 441 | |
| 9 | 430 | |
| 10 | 447 |
| Value | Count | Frequency (%) |
| 1115 | 447 | |
| 1114 | 431 | |
| 1113 | 457 | |
| 1112 | 443 | |
| 1111 | 456 | |
| 1110 | 443 | |
| 1109 | 445 | |
| 1108 | 444 | |
| 1107 | 405 | |
| 1106 | 437 |
StoreType
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | a |
|---|---|
| 2nd row | a |
| 3rd row | a |
| 4th row | d |
| 5th row | b |
Common Values
| Value | Count | Frequency (%) |
| a | 267306 | 54.0% |
| d | 153920 | 31.1% |
| c | 65990 | 13.3% |
| b | 7562 | 1.5% |
Assortment
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | c |
|---|---|
| 2nd row | c |
| 3rd row | a |
| 4th row | c |
| 5th row | b |
Common Values
| Value | Count | Frequency (%) |
| a | 262625 | 53.1% |
| c | 228131 | 46.1% |
| b | 4022 | 0.8% |
CompetitionDistance
Real number (ℝ≥0)
| Distinct | 654 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1323 |
| Missing (%) | 0.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5409.264553 |
| Minimum | 20 |
| Maximum | 75860 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 140 |
| Q1 | 710 |
| median | 2320 |
| Q3 | 6880 |
| 95-th percentile | 20260 |
| Maximum | 75860 |
| Range | 75840 |
| Interquartile range (IQR) | 6170 |
Descriptive statistics
| Standard deviation | 7674.370685 |
|---|---|
| Coefficient of variation (CV) | 1.418745674 |
| Kurtosis | 13.00397241 |
| Mean | 5409.264553 |
| Median Absolute Deviation (MAD) | 1970 |
| Skewness | 2.924543913 |
| Sum | 2669228640 |
| Variance | 58895965.42 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 250 | 5374 | 1.1% |
| 1200 | 3886 | 0.8% |
| 50 | 3587 | 0.7% |
| 350 | 3586 | 0.7% |
| 190 | 3572 | 0.7% |
| 330 | 3138 | 0.6% |
| 180 | 3130 | 0.6% |
| 150 | 3092 | 0.6% |
| 90 | 3082 | 0.6% |
| 1070 | 2725 | 0.6% |
| Other values (644) | 458283 |
| Value | Count | Frequency (%) |
| 20 | 440 | 0.1% |
| 30 | 1788 | |
| 40 | 2238 | |
| 50 | 3587 | |
| 60 | 1359 | 0.3% |
| 70 | 2260 | |
| 80 | 1333 | 0.3% |
| 90 | 3082 | |
| 100 | 2279 | |
| 110 | 2636 |
| Value | Count | Frequency (%) |
| 75860 | 447 | |
| 58260 | 444 | |
| 48330 | 451 | |
| 46590 | 455 | |
| 45740 | 427 | |
| 44320 | 454 | |
| 40860 | 442 | |
| 40540 | 445 | |
| 38710 | 455 | |
| 38630 | 444 |
CompetitionOpenSinceMonth
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 157257 |
| Missing (%) | 31.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.226048157 |
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 4 |
| median | 8 |
| Q3 | 10 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.211344325 |
|---|---|
| Coefficient of variation (CV) | 0.4444122506 |
| Kurtosis | -1.243925006 |
| Mean | 7.226048157 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.1715032332 |
| Sum | 2438943 |
| Variance | 10.31273237 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 9 | 55551 | 11.2% |
| 4 | 41762 | 8.4% |
| 11 | 40962 | 8.3% |
| 3 | 31055 | 6.3% |
| 7 | 29457 | 6.0% |
| 12 | 28335 | 5.7% |
| 10 | 27053 | 5.5% |
| 6 | 22204 | 4.5% |
| 5 | 19378 | 3.9% |
| 2 | 18266 | 3.7% |
| Other values (2) | 23498 | 4.7% |
| (Missing) | 157257 |
| Value | Count | Frequency (%) |
| 1 | 6170 | 1.2% |
| 2 | 18266 | 3.7% |
| 3 | 31055 | |
| 4 | 41762 | |
| 5 | 19378 | 3.9% |
| 6 | 22204 | 4.5% |
| 7 | 29457 | |
| 8 | 17328 | 3.5% |
| 9 | 55551 | |
| 10 | 27053 |
| Value | Count | Frequency (%) |
| 12 | 28335 | |
| 11 | 40962 | |
| 10 | 27053 | |
| 9 | 55551 | |
| 8 | 17328 | 3.5% |
| 7 | 29457 | |
| 6 | 22204 | 4.5% |
| 5 | 19378 | 3.9% |
| 4 | 41762 | |
| 3 | 31055 |
CompetitionOpenSinceYear
Real number (ℝ≥0)
| Distinct | 23 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 157257 |
| Missing (%) | 31.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2008.674189 |
| Minimum | 1900 |
| Maximum | 2015 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 1900 |
|---|---|
| 5-th percentile | 2001 |
| Q1 | 2006 |
| median | 2010 |
| Q3 | 2013 |
| 95-th percentile | 2014 |
| Maximum | 2015 |
| Range | 115 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 6.156380181 |
|---|---|
| Coefficient of variation (CV) | 0.003064897341 |
| Kurtosis | 125.8876054 |
| Mean | 2008.674189 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -7.89739704 |
| Sum | 677969721 |
| Variance | 37.90101693 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2013 | 36852 | 7.4% |
| 2012 | 36309 | 7.3% |
| 2014 | 31055 | 6.3% |
| 2005 | 27533 | 5.6% |
| 2010 | 24534 | 5.0% |
| 2011 | 24069 | 4.9% |
| 2009 | 23971 | 4.8% |
| 2008 | 23793 | 4.8% |
| 2007 | 21245 | 4.3% |
| 2006 | 20896 | 4.2% |
| Other values (13) | 67264 | |
| (Missing) | 157257 |
| Value | Count | Frequency (%) |
| 1900 | 427 | 0.1% |
| 1961 | 462 | 0.1% |
| 1990 | 2252 | 0.5% |
| 1994 | 882 | 0.2% |
| 1995 | 889 | 0.2% |
| 1998 | 441 | 0.1% |
| 1999 | 3548 | 0.7% |
| 2000 | 4419 | 0.9% |
| 2001 | 7043 | |
| 2002 | 11994 |
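The strongly negative skewness (-7.9) and very high kurtosis (125.9) of this year-valued variable point to a few extreme low values such as 1900. A minimal sketch of one common way to flag them is Tukey's IQR rule, applied here to the quartiles and lowest values listed above. The 1.5 multiplier is a convention, not something the report prescribes, and whether the flagged years are errors is a judgment the report does not make.

```python
# Quartiles and the ten lowest observed years, copied from the tables above.
q1, q3 = 2006, 2013
lowest_years = [1900, 1961, 1990, 1994, 1995, 1998, 1999, 2000, 2001, 2002]

iqr = q3 - q1
# Tukey's rule: anything below Q1 - 1.5 * IQR is treated as a candidate outlier.
lower_fence = q1 - 1.5 * iqr

suspect_years = [year for year in lowest_years if year < lower_fence]

print(f"lower fence: {lower_fence}")
print(f"candidate outliers: {suspect_years}")
```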
| Value | Count | Frequency (%) |
| 2015 | 16844 | |
| 2014 | 31055 | |
| 2013 | 36852 | |
| 2012 | 36309 | |
| 2011 | 24069 | |
| 2010 | 24534 | |
| 2009 | 23971 | |
| 2008 | 23793 | |
| 2007 | 21245 | |
| 2006 | 20896 |
Promo2
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 252115 | 51.0% |
| 0 | 242663 | 49.0% |
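The number of rows with the Promo2 flag equal to 0 (242663) is identical to the missing count reported for Promo2SinceWeek, Promo2SinceYear and PromoInterval. This suggests, although the report does not state it, that those gaps are structural: the fields are empty because the store does not take part in the promotion. Under that assumption, a cleaning step could make the gap explicit instead of leaving it as a missing value. The function name and the `not_enrolled` sentinel below are hypothetical, and missing values are represented as `None` for simplicity.

```python
PROMO_FIELDS = ("Promo2SinceWeek", "Promo2SinceYear", "PromoInterval")


def mark_structural_missing(row, sentinel="not_enrolled"):
    """Return a copy of `row` with Promo2-related gaps labelled explicitly.

    Only rows with Promo2 == 0 are touched; gaps in other rows stay missing
    because they would be genuinely unknown values.
    """
    cleaned = dict(row)
    if cleaned.get("Promo2") == 0:
        for field in PROMO_FIELDS:
            if cleaned.get(field) is None:
                cleaned[field] = sentinel
    return cleaned


not_enrolled = {"Promo2": 0, "Promo2SinceWeek": None,
                "Promo2SinceYear": None, "PromoInterval": None}
enrolled = {"Promo2": 1, "Promo2SinceWeek": 5.0,
            "Promo2SinceYear": 2013.0, "PromoInterval": "Feb,May,Aug,Nov"}

print(mark_structural_missing(not_enrolled))
print(mark_structural_missing(enrolled))
```

The comparison `== 0` assumes Promo2 is stored as a number; if it is stored as text, the check would need to compare against `"0"` instead.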
Promo2SinceWeek
Real number (ℝ≥0)
| Distinct | 24 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 242663 |
| Missing (%) | 49.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 23.51320627 |
| Minimum | 1 |
| Maximum | 50 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 13 |
| median | 22 |
| Q3 | 37 |
| 95-th percentile | 45 |
| Maximum | 50 |
| Range | 49 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 14.12903756 |
|---|---|
| Coefficient of variation (CV) | 0.6008979547 |
| Kurtosis | -1.383594275 |
| Mean | 23.51320627 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 0.08225705468 |
| Sum | 5928032 |
| Variance | 199.6297024 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 14 | 35930 | 7.3% |
| 40 | 33099 | 6.7% |
| 31 | 19578 | 4.0% |
| 10 | 18688 | 3.8% |
| 5 | 17406 | 3.5% |
| 37 | 15627 | 3.2% |
| 1 | 15611 | 3.2% |
| 13 | 14958 | 3.0% |
| 45 | 14936 | 3.0% |
| 22 | 14374 | 2.9% |
| Other values (14) | 51908 | 10.5% |
| (Missing) | 242663 |
| Value | Count | Frequency (%) |
| 1 | 15611 | |
| 5 | 17406 | |
| 6 | 451 | 0.1% |
| 9 | 6161 | 1.2% |
| 10 | 18688 | |
| 13 | 14958 | |
| 14 | 35930 | |
| 18 | 12935 | 2.6% |
| 22 | 14374 | |
| 23 | 2182 | 0.4% |
| Value | Count | Frequency (%) |
| 50 | 469 | 0.1% |
| 49 | 416 | 0.1% |
| 48 | 4052 | 0.8% |
| 45 | 14936 | |
| 44 | 1345 | 0.3% |
| 40 | 33099 | |
| 39 | 2557 | 0.5% |
| 37 | 15627 | |
| 36 | 4436 | 0.9% |
| 35 | 11137 | 2.3% |
Promo2SinceYear
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 242663 |
| Missing (%) | 49.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2011.758249 |
| Minimum | 2009 |
| Maximum | 2015 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 2009 |
|---|---|
| 5-th percentile | 2009 |
| Q1 | 2011 |
| median | 2012 |
| Q3 | 2013 |
| 95-th percentile | 2014 |
| Maximum | 2015 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.670213506 |
|---|---|
| Coefficient of variation (CV) | 0.0008302257524 |
| Kurtosis | -1.057709888 |
| Mean | 2011.758249 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.1188747565 |
| Sum | 507194431 |
| Variance | 2.789613156 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2011 | 56648 | 11.4% |
| 2013 | 53489 | 10.8% |
| 2014 | 41202 | 8.3% |
| 2012 | 35876 | 7.3% |
| 2009 | 32314 | 6.5% |
| 2010 | 28202 | 5.7% |
| 2015 | 4384 | 0.9% |
| (Missing) | 242663 |
| Value | Count | Frequency (%) |
| 2009 | 32314 | |
| 2010 | 28202 | |
| 2011 | 56648 | |
| 2012 | 35876 | |
| 2013 | 53489 | |
| 2014 | 41202 | |
| 2015 | 4384 | 0.9% |
| Value | Count | Frequency (%) |
| 2015 | 4384 | 0.9% |
| 2014 | 41202 | |
| 2013 | 53489 | |
| 2012 | 35876 | |
| 2011 | 56648 | |
| 2010 | 28202 | |
| 2009 | 32314 |
PromoInterval
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 242663 |
| Missing (%) | 49.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 16 |
|---|---|
| Median length | 15 |
| Mean length | 15.18682744 |
| Min length | 15 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Feb,May,Aug,Nov |
|---|---|
| 2nd row | Jan,Apr,Jul,Oct |
| 3rd row | Feb,May,Aug,Nov |
| 4th row | Feb,May,Aug,Nov |
| 5th row | Feb,May,Aug,Nov |
Common Values
| Value | Count | Frequency (%) |
| Jan,Apr,Jul,Oct | 147111 | 29.7% |
| Feb,May,Aug,Nov | 57902 | 11.7% |
| Mar,Jun,Sept,Dec | 47102 | 9.5% |
| (Missing) | 242663 | 49.0% |
Length
Pie chart
| Value | Count | Frequency (%) |
| jan,apr,jul,oct | 147111 | |
| feb,may,aug,nov | 57902 | 23.0% |
| mar,jun,sept,dec | 47102 | 18.7% |
Date
Categorical
| Distinct | 577 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2014-02-02 |
|---|---|
| 2nd row | 2013-07-19 |
| 3rd row | 2013-11-24 |
| 4th row | 2013-03-01 |
| 5th row | 2013-12-17 |
Common Values
| Value | Count | Frequency (%) |
| 2013-02-12 | 905 | 0.2% |
| 2014-05-09 | 902 | 0.2% |
| 2013-01-14 | 901 | 0.2% |
| 2013-07-28 | 900 | 0.2% |
| 2013-05-13 | 899 | 0.2% |
| 2013-09-16 | 899 | 0.2% |
| 2013-06-14 | 898 | 0.2% |
| 2014-03-05 | 897 | 0.2% |
| 2013-08-01 | 896 | 0.2% |
| 2013-01-10 | 896 | 0.2% |
| Other values (567) | 485785 |
Length
| Value | Count | Frequency (%) |
| 2013-02-12 | 905 | 0.2% |
| 2014-05-09 | 902 | 0.2% |
| 2013-01-14 | 901 | 0.2% |
| 2013-07-28 | 900 | 0.2% |
| 2013-05-13 | 899 | 0.2% |
| 2013-09-16 | 899 | 0.2% |
| 2013-06-14 | 898 | 0.2% |
| 2014-03-05 | 897 | 0.2% |
| 2013-07-12 | 896 | 0.2% |
| 2013-01-10 | 896 | 0.2% |
| Other values (567) | 485785 |
DayOfWeek
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 14838 |
| Missing (%) | 3.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.993878401 |
| Minimum | 1 |
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.997798516 |
|---|---|
| Coefficient of variation (CV) | 0.5002151582 |
| Kurtosis | -1.247693749 |
| Mean | 3.993878401 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.005397276137 |
| Sum | 1916822 |
| Variance | 3.991198912 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 69077 | |
| 4 | 68851 | |
| 3 | 68809 | |
| 5 | 68433 | |
| 1 | 68422 | |
| 6 | 68186 | |
| 7 | 68162 | |
| (Missing) | 14838 | 3.0% |
| Value | Count | Frequency (%) |
| 1 | 68422 | |
| 2 | 69077 | |
| 3 | 68809 | |
| 4 | 68851 | |
| 5 | 68433 | |
| 6 | 68186 | |
| 7 | 68162 |
| Value | Count | Frequency (%) |
| 7 | 68162 | |
| 6 | 68186 | |
| 5 | 68433 | |
| 4 | 68851 | |
| 3 | 68809 | |
| 2 | 69077 | |
| 1 | 68422 |
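DayOfWeek has 14838 missing values while Date has none, so the gaps can in principle be rebuilt from the date. In the sample rows of this report the coding matches the ISO convention (Monday = 1, Sunday = 7): for instance 2014-02-02, a Sunday, is stored as 7.0. The helper below is a minimal sketch under that assumption; its name is hypothetical and missing values are represented as `None`.

```python
from datetime import date


def fill_day_of_week(date_str, day_of_week):
    """Return day_of_week, deriving it from the ISO date string when missing."""
    if day_of_week is not None:
        return day_of_week
    # isoweekday() returns Monday = 1 ... Sunday = 7.
    return float(date.fromisoformat(date_str).isoweekday())


# A row that already has a value is left untouched.
print(fill_day_of_week("2014-02-02", 7.0))
# A row with a missing DayOfWeek (as in the sample row dated 2013-03-01).
print(fill_day_of_week("2013-03-01", None))
```

Before applying this to the full dataset, it would be worth checking the assumption on all rows where both fields are present.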
Customers
Real number (ℝ≥0)
High correlation (×4), Missing, Zeros
| Distinct | 3745 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 14900 |
| Missing (%) | 3.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 628.8059215 |
| Minimum | 0 |
| Maximum | 7388 |
| Zeros | 82004 |
| Zeros (%) | 16.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 398 |
| median | 604 |
| Q3 | 833 |
| 95-th percentile | 1362 |
| Maximum | 7388 |
| Range | 7388 |
| Interquartile range (IQR) | 435 |
Descriptive statistics
| Standard deviation | 463.3450633 |
|---|---|
| Coefficient of variation (CV) | 0.7368649808 |
| Kurtosis | 7.113851144 |
| Mean | 628.8059215 |
| Median Absolute Deviation (MAD) | 218 |
| Skewness | 1.597509155 |
| Sum | 301750128 |
| Variance | 214688.6476 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 82004 | 16.6% |
| 560 | 1147 | 0.2% |
| 555 | 1127 | 0.2% |
| 517 | 1118 | 0.2% |
| 528 | 1106 | 0.2% |
| 582 | 1104 | 0.2% |
| 625 | 1103 | 0.2% |
| 576 | 1100 | 0.2% |
| 603 | 1097 | 0.2% |
| 571 | 1095 | 0.2% |
| Other values (3735) | 387877 | |
| (Missing) | 14900 | 3.0% |
| Value | Count | Frequency (%) |
| 0 | 82004 | |
| 18 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 50 | 1 | < 0.1% |
| 60 | 1 | < 0.1% |
| 61 | 1 | < 0.1% |
| 64 | 1 | < 0.1% |
| 68 | 1 | < 0.1% |
| 74 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 7388 | 1 | |
| 5387 | 1 | |
| 5297 | 1 | |
| 5112 | 1 | |
| 5106 | 1 | |
| 5063 | 1 | |
| 5034 | 1 | |
| 5028 | 1 | |
| 5014 | 1 | |
| 5013 | 1 |
Open
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 14775 |
| Missing (%) | 3.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 0.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 398060 | 80.5% |
| 0.0 | 81943 | 16.6% |
| (Missing) | 14775 | 3.0% |
Length
Pie chart
| Value | Count | Frequency (%) |
| 1.0 | 398060 | |
| 0.0 | 81943 | 17.1% |
Promo
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 14915 |
| Missing (%) | 3.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 302306 | 61.1% |
| 1.0 | 177557 | 35.9% |
| (Missing) | 14915 | 3.0% |
Length
Pie chart
| Value | Count | Frequency (%) |
| 0.0 | 302306 | |
| 1.0 | 177557 |
SchoolHoliday
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 15033 |
| Missing (%) | 3.0% |
| Memory size | 3.8 MiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 396594 | 80.2% |
| 1.0 | 83151 | 16.8% |
| (Missing) | 15033 | 3.0% |
Length
Pie chart
| Value | Count | Frequency (%) |
| 0.0 | 396594 | |
| 1.0 | 83151 | 17.3% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
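The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the implementation used to produce this report; the function names are made up, and ties are given average ranks, which is a common convention.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x

print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

For this monotonic but nonlinear example, ρ is 1 up to floating-point error while the linear coefficient stays below 1.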
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
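As a minimal sketch of the formula and of the invariance property, the snippet below computes r from its definition and then recomputes it after shifting and rescaling each variable separately. The data and function name are illustrative, and the scale factors are assumed positive (a negative factor would flip the sign of r).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]

r = pearson_r(x, y)
# Shift and rescale X and Y independently.
r_transformed = pearson_r([3 * v + 10 for v in x], [0.5 * v - 4 for v in y])

# Two exact linear relationships with very different slopes.
r_shallow = pearson_r(x, [0.1 * v for v in x])
r_steep = pearson_r(x, [10 * v for v in x])

print(r, r_transformed)
print(r_shallow, r_steep)
```

The first pair of values agrees up to floating-point error, and both exact linear relationships give r close to 1 regardless of slope.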
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
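The pair-counting definition above translates directly into code. The sketch below implements that plain definition (often called tau-a), in which tied pairs count as neither concordant nor discordant. Tie-adjusted variants exist and may be what a given library computes, so results can differ on data with ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in X or Y; it contributes to neither count
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

print(kendall_tau_a(x, y))
```

Here five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.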
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | df_index | Sales | Store | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval | Date | DayOfWeek | Customers | Open | Promo | StateHoliday | SchoolHoliday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 290482 | 0.0 | 524 | a | c | 40860.0 | 9.0 | 2013.0 | 0 | NaN | NaN | NaN | 2014-02-02 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |
| 1 | 139915 | 6267.0 | 253 | a | c | 250.0 | NaN | NaN | 1 | 5.0 | 2013.0 | Feb,May,Aug,Nov | 2013-07-19 | 5.0 | 749.0 | 1.0 | 1.0 | 0 | 0.0 |
| 2 | 318657 | 0.0 | 575 | a | a | 960.0 | 5.0 | 2008.0 | 1 | 13.0 | 2010.0 | Jan,Apr,Jul,Oct | 2013-11-24 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |
| 3 | 361617 | 5900.0 | 653 | d | c | 7520.0 | 7.0 | 2014.0 | 1 | 45.0 | 2009.0 | Feb,May,Aug,Nov | 2013-03-01 | NaN | 548.0 | 1.0 | 0.0 | 0 | 0.0 |
| 4 | 374646 | 8468.0 | 676 | b | b | 1410.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2013-12-17 | 2.0 | 1767.0 | 1.0 | 1.0 | 0 | 0.0 |
| 5 | 166 | 5011.0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2013-06-20 | 4.0 | 539.0 | 1.0 | 1.0 | 0 | 0.0 |
| 6 | 194544 | 5437.0 | 351 | a | a | 5290.0 | 11.0 | 2012.0 | 1 | 5.0 | 2013.0 | Feb,May,Aug,Nov | 2014-06-07 | 6.0 | 493.0 | 1.0 | 0.0 | 0 | 0.0 |
| 7 | 503508 | 8226.0 | 909 | a | c | 1680.0 | NaN | NaN | 1 | 45.0 | 2009.0 | Feb,May,Aug,Nov | 2013-01-04 | 5.0 | 876.0 | 1.0 | 0.0 | 0 | 1.0 |
| 8 | 45019 | 8085.0 | 82 | a | a | 22390.0 | 4.0 | 2008.0 | 1 | 37.0 | 2009.0 | Jan,Apr,Jul,Oct | 2013-03-15 | 5.0 | 831.0 | 1.0 | 0.0 | 0 | 0.0 |
| 9 | 29651 | 0.0 | 54 | d | c | 7170.0 | 8.0 | 2014.0 | 1 | 5.0 | 2013.0 | Feb,May,Aug,Nov | 2013-10-06 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |
Last rows
| | df_index | Sales | Store | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval | Date | DayOfWeek | Customers | Open | Promo | StateHoliday | SchoolHoliday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 494768 | 175203 | 5078.0 | 317 | d | a | 3140.0 | 7.0 | 2013.0 | 1 | 14.0 | 2011.0 | Jan,Apr,Jul,Oct | 2013-04-05 | 5.0 | 604.0 | 1.0 | 0.0 | 0 | 1.0 |
| 494769 | 87498 | 8277.0 | 158 | d | c | 11840.0 | NaN | NaN | 1 | 31.0 | 2009.0 | Feb,May,Aug,Nov | 2014-06-10 | 2.0 | 522.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494770 | 521430 | 6063.0 | 941 | a | a | 1200.0 | 12.0 | 2011.0 | 1 | 31.0 | 2013.0 | Jan,Apr,Jul,Oct | 2013-06-18 | 2.0 | 651.0 | 1.0 | 1.0 | 0 | 0.0 |
| 494771 | 137337 | 5581.0 | 248 | a | c | 340.0 | 9.0 | 2012.0 | 1 | 40.0 | 2012.0 | Jan,Apr,Jul,Oct | 2014-03-11 | 2.0 | 945.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 494772 | 54886 | 4860.0 | 99 | c | c | 2030.0 | 11.0 | 2003.0 | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2014-04-26 | NaN | 464.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494773 | 110268 | 0.0 | 200 | a | a | 1650.0 | 10.0 | 2000.0 | 0 | NaN | NaN | NaN | 2013-05-12 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |
| 494774 | 259178 | 4860.0 | 468 | c | c | 5260.0 | 9.0 | 2012.0 | 0 | NaN | NaN | NaN | 2013-02-11 | 1.0 | 603.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494775 | 365838 | 5691.0 | 660 | a | a | 1200.0 | 11.0 | 2006.0 | 1 | 40.0 | 2014.0 | Jan,Apr,Jul,Oct | 2014-01-10 | 5.0 | 618.0 | 1.0 | 1.0 | 0 | 0.0 |
| 494776 | 131932 | 4180.0 | 239 | d | c | 610.0 | NaN | NaN | 0 | NaN | NaN | NaN | 2013-02-23 | 6.0 | 432.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494777 | 121958 | 5931.0 | 221 | d | c | 13530.0 | 9.0 | 2013.0 | 0 | NaN | NaN | NaN | 2013-06-06 | NaN | 590.0 | 1.0 | 1.0 | 0 | 0.0 |
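In the sample rows above, StateHoliday appears both as `0` and as `0.0`, which is consistent with the column holding mixed types and may be why the report flags it as unsupported. A possible cleaning step, sketched here under that assumption, maps every representation of the same code to a single string. The function name is hypothetical, and the handling of non-numeric codes is a guess because the report does not list the column's values.

```python
def normalize_state_holiday(value):
    """Map mixed int/float/str codes to one string form; missing stays None."""
    if value is None:
        return None
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        # Non-numeric code (hypothetical, e.g. a letter): keep it as text.
        return text
    if number != number:  # NaN is the only value not equal to itself
        return None
    # 0, 0.0, "0" and "0.0" all collapse to "0".
    return str(int(number)) if number.is_integer() else text


for raw in [0, 0.0, "0", "0.0", None, float("nan"), "a"]:
    print(repr(raw), "->", repr(normalize_state_holiday(raw)))
```

After such a step the column would hold a single type and could be profiled as a categorical variable.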